TODO1: check with Dana that I am using the correct fit_event_jack functions across the different scenarios
Citations
Please use the following citations when using this website:
Goin DE, Riddell CA (2022). Comparing two-way fixed effects and new estimators for difference-in-differences: A simulation study and empirical example. Pre-print. ADD LINK.
Riddell CA, Goin DE (2022). Guide for comparing estimators of policy change effects on health. Pre-print. ADD LINK.
Bug reports
Submit any bug reports to: c.riddell [AT] berkeley [DOT] edu or open an issue on Github.
In this companion guide, we illustrate scenarios in which difference in differences (DID) analysis can be applied. We assume readers know what a DID design is and have some background knowledge in policy analysis. Reviewing the manuscript by Goin and Riddell and its references is a great place to start if this guide leaves you with more questions than answers.
We start with the simplest of difference in differences scenarios and slowly amp up the complexity. For each scenario we describe a target parameter to estimate and the parameter estimated by a two-way fixed effects (TWFE) model. We also apply the Goodman-Bacon decomposition to this parameter to determine if the TWFE estimate is partially composed of estimates that are “forbidden” (e.g., ones that compare a newly treated state to a previously treated state) and apply alternate estimation approaches.
The goal is to illustrate scenarios where the usual TWFE method of estimation provides suitable results and scenarios where this approach is biased. In the latter case, we show alternative methods to estimation to overcome the issue.
In the following examples, the effect of state policy changes on a health outcome is the effect of interest.
To get a local version of the data and code used for this analysis, you can run the following code within RStudio. This will download a local copy of all the files contained in the GitHub repository “Guide-to-DID-estimators” on Corinne Riddell’s GitHub.
install.packages("usethis")
usethis::use_course("corinne-riddell/Guide-to-DID-estimators")
By default, this creates a new folder on your desktop (though you can
specify a directory using the destdir argument in the
use_course() function.)
The files will open in RStudio when you run the code. In the future, you can open the Guide-to-DID-estimators.Rproj file by double clicking it or by using the Select “File” > “Open Project” within RStudio and choose the Guide-to-DID-estimators project.
First, load the packages for reading and plotting the data. The
packages bacondecomp,
did, and staggered are specific to DID
analyses.
# you will need to install these packages if you don't have them already
# install.packages(c("here", "readxl", "tidyverse", "patchwork", "magrittr",
# "broom", "ggrepel", "lubridate", "bacondecomp", "staggered", "devtools"))
# devtools::install_github("bcallaway11/did")
# as of July 11, 2022, package `did` is currently unavailable on CRAN, so you
# need to install the developer version from GitHub, using the code above.
# You may also need to download the developer version of `DRDID` before if you
# get an error. To do so use:
# devtools::install_github("pedrohcgs/DRDID")
library(here) #nice file paths
library(readxl) #read in excel data
library(tidyverse) #collection of packages for data science
library(patchwork) #"stiches" together ggplots
library(magrittr) #pipes
library(broom) #tidy displays
library(ggrepel) #for labelling points on a ggplot
library(lubridate) #helps with date objects
library(bacondecomp) #Goodman Bacon Decomposition
library(did) #Callaway and Sant'Anna estimator for DID
library(staggered) #Sun and Abraham estimator for DID
The target trial estimator was implemented using code revised from
the creator’s GitHub
repository. We use source() to read in the functions
here. This code will only work for you if you downloaded the files using
the use_course() function in the previous step.:
source("./helper_func_ed.R")
source("./helper_func_ed_sum.R")
source("./helper_func_ed_sum_hte.R")
The first file is a slightly edited version of the original helper_funcs.R file found in Ben-Michael’s linked repository. The other helper functions are ones Dana Goin made to aggregate the estimates under different settings.
Since the packages used for these analyses are relatively new, we
anticipate them to change over time and that these changes may affect
the readers ability to reproduce the results. For these analyses we used
bacondecomp version 0.1.1, staggered version
1.1 and did version 2.2.0.901. If your results differ or
you get an error running any of the code that uses functions from those
packages, then it may be because you are using different versions of the
packages.
packageVersion("bacondecomp")
## [1] '0.1.1'
packageVersion("staggered")
## [1] '1.1'
packageVersion("did")
## [1] '2.2.0.901'
We first consider the simplest case in which there is one state that never adopts the policy (the never-treated group), and one state that does implement the policy (the treated group). The groups are observed at two time periods only.
Below, the data are read in from an Excel spreadsheet. You can open
the spreadsheet in Excel to view it if that comes naturally to you, or,
use an R View()er window after you have imported it. The
data contain four variables state, time,
policy, and outcome.
state: ID for the grouping variabletime: Time index, begins at 1policy: Indicator variable for exposure to the policy
changeoutcome: The outcome variables1 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen1",
col_types = c("text", "text", "text", "numeric"))
# str(s1)
# View(s1)
Note that I specified the column types for each variable. I read in the state, time, and policy variables as “text” so that R will consider them as categorical/factor variables in the analysis (or, using econ-speak, as “fixed effects”).
The following can be read from the labelled plot:
Three standard assumptions are typical stated in DID papers and required for the DID estimate to estimate the causal effect of interest:
How is 3) different from 1)? Seems like if something other than the policy changing affected the outcome the trends wouldn’t be parallel
If you’re familiar with causal inference theory, you will also know about the main assumptions required when estimating causal effects. Ben-Michael et al. list these in their paper, and we include them here for reference:
Consistency: There are no multiple versions of treatment that are unknown to the investigator. For example, if states introduced a texting ban while driving and these bans were associated with different penalties, then the policy change would not meet the consistency assumption because the different penalties may have varied effects on the outcome and the investigator is analyzing these policies as though they were the same. However, if the investigator knows that the penalties differed and estimated the effects accounting for this heterogeneity, then the consistency assumption would not be violated.
No interference: Policies that affect individuals in one state should not affect the outcomes of individuals in other states.
Exchangeability: Had the treated states been untreated, they would have experienced the trend in outcomes experienced by the untreated group. This causal assumption is the same as the parallel trends assumption. [DANA DO YOU AGREE?] Yes, although I think we should also say that the potential outcomes are independent of the treatment received.
Positivity: Each state has a non-zero probability of being treated. According to Ben-Michael et al, this assumption is not required for standard DID models.
Correct model specification: SOMETHING ABOUT LINEARITY? ALSO RELATED TO THE PARALLEL TRENDS ASSUMPTION
Do you think we need this if we have the rest of the assumptions? I’m trying to think what this covers that isn’t already listed and can’t think of anything
Under the canonical TWFE design, we can estimate the policy effect by including a fixed effect (FE) for state, a FE for time, and an indicator for the policy change. The indicator should be 1 for when the treated states are in the post-treatment period, and 0 otherwise. The effect estimate is the regression coefficient on the policy variable.
s1_mod <- lm(outcome ~ state + time + policy, data = s1)
tidy(s1_mod)
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9 NaN NaN NaN
## 2 state2 1.00 NaN NaN NaN
## 3 time2 3 NaN NaN NaN
## 4 policy1 2.00 NaN NaN NaN
The coefficient on the policy term is 2 showing that the regression model estimate equalled the causal effect.
Note: that in this scenario and the other simple
scenarios we cover, there is no variability in the data, so the model
cannot estimate a standard error (as indicated by the NaN
in the regression output). In a later scenario, we add noise to the data
and show how to estimate the standard error.
Takeaway: When there are only two treatment groups and two time points, where one of the groups becomes treated, you can calculate the DID estimate by hand or using TWFE regression.
This scenario is similar to Scenario 1, except now there are two states that undergo treatment. In Scenario 2A, the magnitude of the treatment effect is the same in both states. In Scenario 2B, the magnitude of the treatment effect varies by state.
s2a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2a",
col_types = c("text", "text", "text", "numeric"))
s2b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2b",
col_types = c("text", "text", "text", "numeric"))
s2a %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
state == "1" ~ "never-treated"))
s2b %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
state == "1" ~ "never-treated"))
The following can be read from the labelled plot for Scenario 2A:
The following can be read from the labelled plot for Scenario 2B:
s2a_mod <- lm(outcome ~ policy + state + time, data = s2a)
tidy(s2a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 5 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9 0 Inf 0
## 2 policy1 2.00 0 Inf 0
## 3 state2 1.00 0 Inf 0
## 4 state3 2.00 0 Inf 0
## 5 time2 3 0 Inf 0
The coefficient estimate for the policy variable equals to 2, which is the ATT for Scenario 2a.
s2b_mod <- lm(outcome ~ policy + state + time, data = s2b)
tidy(s2b_mod)
## # A tibble: 5 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9 0.500 18.0 0.0353
## 2 policy1 2.5 0.866 2.89 0.212
## 3 state2 0.750 0.661 1.13 0.460
## 4 state3 2.25 0.661 3.40 0.182
## 5 time2 3 0.707 4.24 0.147
The coefficient estimate for the policy variable equals to 2.5, which is the ATT for Scenario 2b.
Takeaway: When there are multiple treatment groups and two time points, where >1 of the groups becomes treated, you can calculate the DID estimate by hand or using TWFE regression. If treatment effects are heterogeneous across states, then the estimated effect is the average ATT across the states.
In Scenario 3, the number of time periods is increased to incorporate 5 time points before and after a policy change. One never-treated and one treated group are considered. In Scenario 3A, the effect of treatment is constant once treatment is introduced. In Scenario 3B, the effect of treatment increases with time (i.e., it is dynamic).
Because there are multiple time points, when we read in the data we update the column type for the time variable to be numeric. This will help with plotting the data.
s3a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3a",
col_types = c("text", "numeric", "text", "numeric", "text"))
s3b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3b",
col_types = c("text", "numeric", "text", "numeric", "text"))
In addition to state, time,
policy, and outcome, the data contains another
variable time_since_policy which equals 0 before treatment
(and always equals 0 for the never-treated), and counts the time since
treatment in the treated state.
The following can be read from the labelled plot for Scenario 3A:
The following can be read from the labelled plot for Scenario 3B:
For Scenario 3B the effect is dynamic, and rather than calculating an ATT that aggregates over post-time, we may want to calculate the effect separately by time since treatment. Here, the effect increases over time. This could happen if the policy change takes time to “kick-in” (so it might level off after more time passes) or if there is a positive feedback loop.
In contrast, some policies may be associated with large initial effects that attenuate over time. Knowing if a policy introduction is associated with a changing effect over time is therefore important for understanding the real-world effects of the policy.
Thus, in this scenario we first estimate the average treatment effect using a TWFE model, and then we estimate the dynamic effect by extending the TWFE model to include a categorical indicator for time since the policy’s introduction.
Estimation of the dynamic effect in Scenario 3A. Different from the
previous scenarios, we need to use factor(time) to ensure
time is modeled using indicator variables (i.e., time fixed effects)
since we imported it as a numeric variable in this scenario to make
plotting easier.
s3a_mod <- lm(outcome ~ policy + state + factor(time),
data = s3a)
tidy(s3a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 12 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 4.39e-15 2.05e15 3.62e-120
## 2 policy1 2.00 5.07e-15 3.94e14 1.92e-114
## 3 state2 2.00 3.59e-15 5.57e14 1.20e-115
## 4 factor(time)2 3.00 5.67e-15 5.29e14 1.83e-115
## 5 factor(time)3 6.00 5.67e-15 1.06e15 7.16e-118
## 6 factor(time)4 9 5.67e-15 1.59e15 2.79e-119
## 7 factor(time)5 12 5.67e-15 2.12e15 2.80e-120
## 8 factor(time)6 15 6.21e-15 2.41e15 9.73e-121
## 9 factor(time)7 18 6.21e-15 2.90e15 2.26e-121
## 10 factor(time)8 21 6.21e-15 3.38e15 6.59e-122
## 11 factor(time)9 24 6.21e-15 3.86e15 2.26e-122
## 12 factor(time)10 27 6.21e-15 4.34e15 8.83e-123
The coefficient of the policy term for scenario 3A is 2 showing that the regression model estimate equaled the causal effect.
Estimation of the dynamic effect in Scenario 3B:
s3b_mod <- lm(outcome ~ policy + state + factor(time),
data = s3b)
tidy(s3b_mod)
## # A tibble: 12 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 1.22 7.35 0.0000801
## 2 policy1 6 1.41 4.24 0.00283
## 3 state2 2.00 1.00 2.00 0.0805
## 4 factor(time)2 3.00 1.58 1.90 0.0943
## 5 factor(time)3 6.00 1.58 3.79 0.00528
## 6 factor(time)4 9.00 1.58 5.69 0.000459
## 7 factor(time)5 12 1.58 7.59 0.0000637
## 8 factor(time)6 13.0 1.73 7.51 0.0000689
## 9 factor(time)7 17 1.73 9.81 0.00000976
## 10 factor(time)8 21 1.73 12.1 0.00000198
## 11 factor(time)9 25.0 1.73 14.4 0.000000519
## 12 factor(time)10 29.0 1.73 16.7 0.000000164
The coefficient of the policy term for scenario 3B is 6 showing that the regression model estimate equaled the causal effect.
time_since_policy specificationRather than using a binary indicator for the policy change, it can be
coded using the time_since_policy variable. If we include
this as a factor variable in the model, then separate effects will be
estimated for each time since treatment.
For Scenario 3A, using this variable should yield 2 for each time since treatment, since the treatment effect is constant over time.
s3a_mod2 <- lm(outcome ~ time_since_policy + state + factor(time),
data = s3a)
tidy(s3a_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 7.57e-15 1.19e15 3.00e-60
## 2 time_since_policy1 2.00 1.51e-14 1.32e14 1.97e-56
## 3 time_since_policy2 2.00 1.51e-14 1.32e14 1.97e-56
## 4 time_since_policy3 2.00 1.51e-14 1.32e14 1.97e-56
## 5 time_since_policy4 2.00 1.51e-14 1.32e14 1.97e-56
## 6 time_since_policy5 2.00 1.51e-14 1.32e14 1.97e-56
## 7 state2 2.00 6.18e-15 3.24e14 5.47e-58
## 8 factor(time)2 3.00 9.77e-15 3.07e14 6.76e-58
## 9 factor(time)3 6.00 9.77e-15 6.14e14 4.22e-59
## 10 factor(time)4 9 9.77e-15 9.21e14 8.34e-60
## 11 factor(time)5 12 9.77e-15 1.23e15 2.64e-60
## 12 factor(time)6 15.0 1.24e-14 1.21e15 2.77e-60
## 13 factor(time)7 18.0 1.24e-14 1.46e15 1.33e-60
## 14 factor(time)8 21.0 1.24e-14 1.70e15 7.20e-61
## 15 factor(time)9 24 1.24e-14 1.94e15 4.22e-61
## 16 factor(time)10 27.0 1.24e-14 2.18e15 2.64e-61
Indeed, this is what we see.
For Scenario 3B, the effect increases by 2 units for every unit of time since treatment is initiated so this should be reflected in the coefficient estimates.
s3b_mod2 <- lm(outcome ~ time_since_policy + state + factor(time),
data = s3b)
tidy(s3b_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 1.05e-14 8.56e14 1.12e-59
## 2 time_since_policy1 2.00 2.10e-14 9.51e13 7.33e-56
## 3 time_since_policy2 4.00 2.10e-14 1.90e14 4.58e-57
## 4 time_since_policy3 6.00 2.10e-14 2.85e14 9.05e-58
## 5 time_since_policy4 8.00 2.10e-14 3.80e14 2.86e-58
## 6 time_since_policy5 10.0 2.10e-14 4.76e14 1.17e-58
## 7 state2 2.00 8.58e-15 2.33e14 2.04e-57
## 8 factor(time)2 3.00 1.36e-14 2.21e14 2.51e-57
## 9 factor(time)3 6.00 1.36e-14 4.42e14 1.57e-58
## 10 factor(time)4 9.00 1.36e-14 6.63e14 3.10e-59
## 11 factor(time)5 12.0 1.36e-14 8.84e14 9.82e-60
## 12 factor(time)6 15.0 1.72e-14 8.74e14 1.03e-59
## 13 factor(time)7 18.0 1.72e-14 1.05e15 4.97e-60
## 14 factor(time)8 21.0 1.72e-14 1.22e15 2.68e-60
## 15 factor(time)9 24 1.72e-14 1.40e15 1.57e-60
## 16 factor(time)10 27.0 1.72e-14 1.57e15 9.81e-61
For this scenario, the effect estimate is 2 in the first time after treatment, then 4, and so on.
Note that we could have modeled time_since_policy
linearly (e.g., by including a numeric variable for
time_since_policy). Here, we modeled
time_since_policy as a factor variable, which allows the
effect estimates to change non-linearly over time.
Takeaway: When there is one treated group and one control group, with multiple pre- and post- time points, you can calculate the DID estimate using TWFE regression when the effect is constant or dynamic. Using a “time since treatment” indicator will estimate effects separately by time since the policy change, allowing you to see how the effect has changed over time.
Scenario 4 extends Scenario 3 to the multiple group setting. State 1 is never-treated, while states 2-4 undergo treatment at time=6. The effects are not dynamic in time, but they are heterogeneous with some states having a larger treatment effect.
Note there are two new columns time_as_date and
time_first_trt_date that we will use later in this
example.
s4 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen4",
col_types = c("text", "numeric", "text", "numeric", "text", "numeric", "date", "date"))
The following can be read from the labelled plot:
Pre-post difference for never-treated state 1: 30-15 = 15
Pre-post difference for treated state 2: 34-17 = 17
Pre-post difference for treated state 3: 35-19 = 16
Pre-post difference for treated state 4: 39-21 = 18
\(DID_{2 vs 1} = 17-15 = 2\)
\(DID_{3 vs 1} = 16-15 = 1\)
\(DID_{4 vs 1} = 18-15 = 3\)
\(\text{Average ATT} = (2 + 1 + 3)/3 = 2\)
s4_mod <- lm(outcome ~ policy + state + factor(time),
data = s4)
tidy(s4_mod)
## # A tibble: 14 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 0.277 32.4 1.45e-22
## 2 policy1 2.00 0.320 6.24 1.31e- 6
## 3 state2 2.00 0.253 7.90 2.24e- 8
## 4 state3 3.50 0.253 13.8 1.71e-13
## 5 state4 6.50 0.253 25.7 5.34e-20
## 6 factor(time)2 3.00 0.310 9.67 4.20e-10
## 7 factor(time)3 6.00 0.310 19.3 5.83e-17
## 8 factor(time)4 9 0.310 29.0 2.44e-21
## 9 factor(time)5 12 0.310 38.7 1.63e-24
## 10 factor(time)6 15.0 0.392 38.2 2.20e-24
## 11 factor(time)7 18.0 0.392 45.9 2.06e-26
## 12 factor(time)8 21.0 0.392 53.5 3.89e-28
## 13 factor(time)9 24.0 0.392 61.2 1.24e-29
## 14 factor(time)10 27.0 0.392 68.8 5.91e-31
The coefficient of the policy term is 2 showing that the regression model estimates the ATT.
In this scenario and the ones considered before it we have not needed
to use the new estimators because the TWFE uncover the true effect –
this is the case because the policy change was introduced at one time
point in each scenario. Below, we show how to use the Goodman-Bacon
decomposition function bacon(), to show how its results in
the case where the standard DID approach works so that you can contrast
this with later scenarios where the standard approach fails.
Before using the bacon() function, we need to create
numeric-coded versions of some of the variables that are stored as
characters:
s4 %<>% mutate(state_n = as.numeric(as.character(state)),
policy_n = as.numeric(as.character(policy)))
The bacon function has an argument called
formula, where you list the outcome as a function of the
policy change, i.e., formula = outcome ~ policy_n. Note
that no fixed effects for state or time are included in the formula.
Instead, the time and state fixedeffects are specified by the
time_varand id_var arguments.
s4_bacon <- bacon(formula = outcome ~ policy_n,
data = s4,
id_var = "state_n",
time_var = "time")
## type weight avg_est
## 1 Treated vs Untreated 1 2
This small table illustrates that 100% of the weight is put on comparisons between treated and untreated states and the ATT equals 2.
For more detail we print the object:
s4_bacon
## treated untreated estimate weight type
## 2 6 99999 2 1 Treated vs Untreated
treated=6 says that all treated states start treatment
at time = 6 and are compared to untreated states (with a fictitious
treatment time of 99999).
The results from the Goodman-Bacon Decomposition are reassuring and
tell us we can trust the TWFE estimate. For pedagodgical purposes, we
also calculate the Group-Time ATT estimator using the
att_gt function.
Like bacon(), att_gt() requires all numeric
variables. It also requires an argument called gname.
gname expects a variable which encodes for each state the
time of first policy change. For never-treated state 1, this variable
equals 0 and for treated states 2-4 this variable equals 6.
Is there a reason why you used notyettreated rather than nevertreated for the control group here?
there is an error when try to use nevertreated: Error in pre_process_did… never treated group is too small, try setting control_group to notyettreated
s4_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state_n",
gname = "time_first_treat",
data = s4,
control_group = "notyettreated",
anticipation = 0)
## No pre-treatment periods to test
m2_ag <- aggte(s4_cs, type="simple")
summary(m2_ag)
##
## Call:
## aggte(MP = s4_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 2 0.9884 0.0628 3.9372 *
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Here, we see the ATT is equal to 2 as estimated by the TWFE model.
We can also calculate the estimate using the Target Trial Estimator
introduced by Ben-Michael, Feller, and Stuart. We employ slightly edited
versions of the code that these authors provide with their published
article. The code calculates estimates of the effect for every event
time (also called time since treatment). We built upon this code to
further aggregate across event times to produce one overall estimate
using the fit_event_jack_sum() and
fit_event_jack_sum_hte() functions. Use
fit_event_jack_sum() when there is only one adoption
cohort, as shown here.
s4_bm <- fit_event_jack_sum(outcome_var = "outcome",
date_var = "time_as_date",
unit_var = "state",
policy_var = "time_first_trt_date",
data = s4,
max_time_to = 10000)
## Dropping 1
## Dropping 2
## Dropping 3
## Dropping 4
s4_bm %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
s4_bm
## se estimate lb ub
## 1 NaN 2 NaN NaN
The ATT is equal to the effect estimated by the TWFE and Group-Time estimators.
Ideally, we would also show you how to run the Cohort ATT estimator
for this scenario but the function to do so,
staggered_sa(), throws an error when trying to estimate the
standard error. Thus, we will show you how to use this function in the
scenario that incorporates noise into the data.
Scenario 5 introduces staggered timing. In this example, the treated states introduce the policy at two different time points. The treatment effects are non-dynamic and there are two never-treated states and two ever-treated states.
s5 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen5",
col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
"date", "date"))
Visualization of all comparisons made by TWFE regression
Contrast 1:
Contrast 2:
Contrast 3:
Contrast 4:
Since the causal effect equals 3 across all contrasts, and all contrasts are valid, this is the quantity that we want the TWFE and new estimators to estimate.
s5_mod <- lm(outcome ~ policy + state + factor(time), data = s5)
tidy(s5_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 24 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.00 1.65e-14 5.46e14 0
## 2 policy1 3.00 1.31e-14 2.30e14 0
## 3 state2 1.00 9.43e-15 1.06e14 0
## 4 state3 3.00 1.41e-14 2.13e14 0
## 5 state4 9.00 1.11e-14 8.10e14 0
## 6 factor(time)2 3.00 2.11e-14 1.42e14 0
## 7 factor(time)3 6.00 2.11e-14 2.85e14 0
## 8 factor(time)4 9.00 2.11e-14 4.27e14 0
## 9 factor(time)5 12 2.13e-14 5.63e14 0
## 10 factor(time)6 15.0 2.13e-14 7.03e14 0
## # … with 14 more rows
The coefficient of the policy term is 3 showing that the regression model estimates the ATT.
Using the bacon()function, we can see see the four
comparisons being made and the effect estimates for each comparison. The
weight column in the output shows how much weight each estimate
contributes to the ATT. Here, the treated vs untreated comparisons are
the most heavily weighted.
s5 %<>% mutate(state_n = as.numeric(as.character(state)),
policy_n = as.numeric(as.character(policy)))
#bacon decomp says no problems
#only comparing treated vs untreated and the average effect estimate is 2
s5_bacon <- bacon(outcome ~ policy_n,
data = s5,
id_var = "state_n",
time_var = "time")
## type weight avg_est
## 1 Earlier vs Later Treated 0.06715 3
## 2 Later vs Earlier Treated 0.15108 3
## 3 Treated vs Untreated 0.78177 3
We can also see a list of each comparison and its weight:
s5_bacon
## treated untreated estimate weight type
## 2 5 99999 3 0.30695444 Treated vs Untreated
## 3 12 99999 3 0.47482014 Treated vs Untreated
## 6 12 5 3 0.15107914 Later vs Earlier Treated
## 8 5 12 3 0.06714628 Earlier vs Later Treated
For pedagogical purposes, we also show the estimate if we applied the
Callaway Sant’Anna estimator using the att_gt() function to
estimate group-time ATTs.
s5_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state_n",
gname = "time_first_treat",
data = s5,
control_group = "notyettreated",
anticipation = 0)
## No pre-treatment periods to test
Note that I suppressed the warnings by using
warning=FALSE in the R markdown chunk header for this
specific chunk. This is because it prints more than 100 lines of the
same warning “## Warning in max(abs(b/bSigma), na.rm = TRUE): no
non-missing arguments to max; ## returning -Inf”. This warning is
because there is no variability in the the data so the model cannot
estimate a standard error. This is not realistic, and in a later example
we will add in variability. I will suppress the warnings in this
document whenever they are too numerous. In practice, you will want to
heed warnings and understand why they occur before suppressing them.
s5_cs_ag <- aggte(s5_cs, type="simple")
summary(s5_cs_ag)
##
## Call:
## aggte(MP = s5_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 3 NA NA NA
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Here, we see the ATT is equal to 3 as estimated by the TWFE model.
We can also calculate the estimate using the Target Trial Estimator. Here we use fit_event_jack_sum() because the effect is not hetergeneous.
s5_bm <- fit_event_jack_sum(outcome_var = "outcome",
date_var = "time_as_date",
unit_var = "state",
policy_var = "time_first_trt_date",
data = s5,
max_time_to = 10000)
## Dropping 1
## Dropping 2
## Dropping 3
## Dropping 4
s5_bm %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
s5_bm
## se estimate lb ub
## 1 0 3 3 3
This method also estimates an effect of 3, the true estimate, and the same as the other estimators.
Takeaway: When a policy’s introduction is staggered in time and the treatment effect is constant (not dynamic or heterogeneous), TWFE can be used to estimate the treatment effect.
Scenario 6 is similar to Scenario 5 except now the treatment effect increases as time goes on, i.e., it is dynamic. The magnitude of the treatment effect is the same for both treated groups.
s6 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen6",
col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
"date", "date"))
Visualization of all comparisons made by TWFE regression
Contrast 1:
Contrast 2:
Contrast 3:
Contrast 4:
Contrasts 1 and 2 are okay because they are clean comparisons between never-treated and treated states. Contrasts 3 and 4 are trickier. Contrast 3 compares the earlier treated state to the later treated state. It is okay because parallel trends holds and it only uses the time before the later state is treated to inform the estimation. Contrast 4 is the problem – it uses post-treatment time from state 3 as the “pre” time for the comparison. The issue is that this post time includes the effects of the treatment, and since the effect is dynamic it does not satisfy the parallel trends assumption. However this contrast still contributes to the TWFE regression estimate. To see this, we first calculate the TWFE regression estimate and then use the Goodman Bacon decomposition to see how much weight each contrast contributes to the TWFE estimate.
I think we will want to include an estimate of the truth here. This depends on what parameter we’re interested in, but I think the average dynamic effect is the one that makes the most sense.
s6_mod <- lm(outcome ~ policy + state + factor(time),
data = s6)
tidy(s6_mod)
## # A tibble: 24 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 8.24 1.28 6.44 2.90e- 8
## 2 policy1 6.80 1.01 6.72 1.02e- 8
## 3 state2 1.00 0.731 1.37 1.77e- 1
## 4 state3 5.96 1.09 5.46 1.11e- 6
## 5 state4 9.09 0.861 10.6 6.23e-15
## 6 factor(time)2 3.00 1.63 1.84 7.17e- 2
## 7 factor(time)3 6.00 1.63 3.67 5.41e- 4
## 8 factor(time)4 9 1.63 5.51 9.51e- 7
## 9 factor(time)5 11.1 1.65 6.68 1.16e- 8
## 10 factor(time)6 14.3 1.65 8.65 6.76e-12
## # … with 14 more rows
The coefficient of the policy term equals 6.8. We can now use the Goodman Bacon decomposition to see how much weight each of the contrasts contributes to the effect estimate:
s6 %<>% mutate(state_n = as.numeric(as.character(state)),
policy_n = as.numeric(as.character(policy)))
s6_bacon <- bacon(outcome ~ policy_n,
data = s6,
id_var = "state_n",
time_var = "time")
## type weight avg_est
## 1 Earlier vs Later Treated 0.06715 6.00000
## 2 Later vs Earlier Treated 0.15108 -1.00000
## 3 Treated vs Untreated 0.78177 8.37423
It is more helpful to view the four contrasts separately:
s6_bacon
## treated untreated estimate weight type
## 2 5 99999 10.5 0.30695444 Treated vs Untreated
## 3 12 99999 7.0 0.47482014 Treated vs Untreated
## 6 12 5 -1.0 0.15107914 Later vs Earlier Treated
## 8 5 12 6.0 0.06714628 Earlier vs Later Treated
We can see that the estimates of the treatment effect for each contrast are as we calculated them earlier. Note that 15% of the weight of the TWFE estimate is on Contrast 4 – the one making the improper comparison between an earlier treated and a later treated state.
We can also confirm that you get the TWFE estimate by taking the weighted average of the Goodman-Bacon decomposition components:
#see the TWFE estimate
sum(s6_bacon$estimate*s6_bacon$weight)
## [1] 6.798561
The decomposition shows that the TWFE regression estimate is biased because it incorporates a contrast that we would not want to make in practice between an earlier treated and a later treated group. To overcome this, we use one of the new estimators. Consider the Group-Time ATT to start.
Like in the earlier examples we start by running the
att_gt() function. Unlike before, we display the output
from this step using summary(). This output shows the
estimated ATT for each combination of treated state and time. The time
is long because it includes estimates for pre-policy time, which helps
with the evaluation of the parallel trends assumption or to see if there
are any lead effects (“anticipation”) of the treatment on the outcome.
You wouldn’t usually report this entire table, but it worth showing here
to see how the estimated effect is dynamic and that decisions need to be
make about how to report dynamics effects including whether or not to
aggregate the effect, and if so, at what level.
s6_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state_n",
gname = "time_first_treat",
data = s6,
control_group = "notyettreated",
anticipation = 0)
## No pre-treatment periods to test
summary(s6_cs)
##
## Call:
## att_gt(yname = "outcome", tname = "time", idname = "state_n",
## gname = "time_first_treat", data = s6, control_group = "notyettreated",
## anticipation = 0)
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
## Group-Time Average Treatment Effects:
## Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
## 5 2 0 NA NA NA
## 5 3 0 NA NA NA
## 5 4 0 NA NA NA
## 5 5 3 NA NA NA
## 5 6 4 NA NA NA
## 5 7 5 NA NA NA
## 5 8 6 NA NA NA
## 5 9 7 NA NA NA
## 5 10 8 NA NA NA
## 5 11 9 NA NA NA
## 5 12 10 NA NA NA
## 5 13 11 NA NA NA
## 5 14 12 NA NA NA
## 5 15 13 NA NA NA
## 5 16 14 NA NA NA
## 5 17 15 NA NA NA
## 5 18 16 NA NA NA
## 5 19 17 NA NA NA
## 5 20 18 NA NA NA
## 12 2 0 NA NA NA
## 12 3 0 NA NA NA
## 12 4 0 NA NA NA
## 12 5 0 NA NA NA
## 12 6 0 NA NA NA
## 12 7 0 NA NA NA
## 12 8 0 NA NA NA
## 12 9 0 NA NA NA
## 12 10 0 NA NA NA
## 12 11 0 NA NA NA
## 12 12 3 NA NA NA
## 12 13 4 NA NA NA
## 12 14 5 NA NA NA
## 12 15 6 NA NA NA
## 12 16 7 NA NA NA
## 12 17 8 NA NA NA
## 12 18 9 NA NA NA
## 12 19 10 NA NA NA
## 12 20 11 NA NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Callaway and Sant’Anna provide many options for aggregating the
group-time effects. The simplest option is specify
type = simple in the attge() function. This
estimate considered only the contrasts with a never-treated state (i.e.,
Comparisons 1 and 2 in the figure above) and combines them into a
weighted average, where the weights correspond to each adoption cohort’s
time spent in the post-treatment period. For contrast 1, there are 16
time periods post treatment and for contrast 2, there are 9 periods
post-treatment. Thus the weighted average is: \(10.5\times(16/25) + 7\times(9/25)=\)
9.24.
#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s6_cs_ag <- aggte(s6_cs, type="simple")
summary(s6_cs_ag)
##
## Call:
## aggte(MP = s6_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 9.24 1.1956 6.8967 11.5833 *
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Alternatively, we can specify type = "group" to get
separate effect estimates according to time of implementation. Note that
group denotes the time of the policy’s introduction for the
treated states.
s6_cs_ag2 <- aggte(s6_cs, type = "group")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s6_cs_ag2)
##
## Call:
## aggte(MP = s6_cs, type = "group")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on group/cohort aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 8.75 NA NA NA
##
##
## Group Effects:
## Group Estimate Std. Error [95% Pointwise Conf. Band]
## 5 10.5 NA NA NA
## 12 7.0 NA NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
The output also displays an estimate of the overall ATT, equal to
8.75. This estimate is different from the one estimated
where type = "simple". Here, the estimate is a weighted
average, with weights proportional to the number of units in each
adoption cohort. Thus, the researcher may want to estimate the ATT using
this method if they prefer these weights over the weights specified by
the previous model.
You can also estimate the dynamic effect separately for each time since the treatment was introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be surprised by all the rows in the outputted table!
s6_cs_ag3 <- aggte(s6_cs, type="dynamic")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s6_cs_ag3)
##
## Call:
## aggte(MP = s6_cs, type = "dynamic")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on event-study/dynamic aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 10.5 NA NA NA
##
##
## Dynamic Effects:
## Event time Estimate Std. Error [95% Pointwise Conf. Band]
## -10 0 NA NA NA
## -9 0 NA NA NA
## -8 0 NA NA NA
## -7 0 NA NA NA
## -6 0 NA NA NA
## -5 0 NA NA NA
## -4 0 NA NA NA
## -3 0 NA NA NA
## -2 0 NA NA NA
## -1 0 NA NA NA
## 0 3 NA NA NA
## 1 4 NA NA NA
## 2 5 NA NA NA
## 3 6 NA NA NA
## 4 7 NA NA NA
## 5 8 NA NA NA
## 6 9 NA NA NA
## 7 10 NA NA NA
## 8 11 NA NA NA
## 9 12 NA NA NA
## 10 13 NA NA NA
## 11 14 NA NA NA
## 12 15 NA NA NA
## 13 16 NA NA NA
## 14 17 NA NA NA
## 15 18 NA NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Again, the Overall ATT effect estimate is different from the other two. Here, it is a simple average of all effects in the post-treatment time. This is equal to \((3 + 4 + ... + 18)/16 = 10.5\)
Going back to the dynamic effect estimation, it is very helpful to
plot the estimates over time using the ggdid()
function:
ggdid(s6_cs_ag3)
Lastly, you can use type = "calendar" to aggregate the
effects over calendar time. Epidemiologically, this doesn’t make a lot
of sense with dynamic effects because it is mixing effect estimates
across difference times since treatment was introduced. In some cases,
only one comparison is contributing to the estimation (e.g., in calendar
time periods where only one group has introduced the treatment), while
at other points, two comparisons contribute. But this “knowledge” is
lost in the presentation. We don’t show the output from using
type = "calendar" but include the code below in case it
makes sense for your setting.
#calendar=time specific
#not recommended for this setting...but Callaway and Sant'Anna say they prefer
#it here https://bcallaway11.github.io/did/articles/did-basics.html
s6_cs_ag4 <- aggte(s6_cs, type = "calendar")
summary(s6_cs_ag4)
ggdid(s6_cs_ag4)
We can also estimate the effects using the Target Trial estimator.
This time, we start with the fit_event_jack() function
which gives an estimate for each time since treatment. This estimator
yields the same estimates as the Group Time ATT approach.
s6_bm <- fit_event_jack(outcome_var = "outcome",
date_var = "time_as_date",
unit_var = "state",
policy_var = "time_first_trt_date",
data = s6,
max_time_to = 10e7)
## Dropping 1
## Dropping 2
## Dropping 3
## Dropping 4
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining, by = c("cohort", "event_time")
s6_bm_ave <- s6_bm %>% filter(cohort == "average")
s6_bm_ave %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
ggplot(s6_bm_ave, aes(x = event_time, y = estimate)) +
geom_hline(yintercept = 0, lty = 2) +
geom_point(aes(col = event_time >= 0)) +
geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) +
labs(y = "Estimate", x = "Event time") +
theme_bw() +
scale_color_discrete(labels=c('pre', 'post')) +
theme(legend.title=element_blank(), legend.position = "bottom")
To summarize the dynamic effects into one aggregated effect estimate we use the fit_event_jack_sum() function:
s6_bm2 <- fit_event_jack_sum(outcome_var = "outcome",
date_var = "time_as_date",
unit_var = "state",
policy_var = "time_first_trt_date",
data = s6,
max_time_to = 10e7)
## Dropping 1
## Dropping 2
## Dropping 3
## Dropping 4
s6_bm2
## se estimate
## 1 2.625 10.5
The aggregated effect estimate equals 10.5, equivalent to the Overall
ATT from the Group-Time estimator when
type = "dynamic".
time_since_policy specificationBut what if we model TWFE with the time_since_change
indicators?
s6_mod2 <- lm(outcome ~ time_since_policy + state + factor(time),
data = s6)
tidy(s6_mod2) %>% filter(str_detect(term, "time_since")) %>%
mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>%
arrange(time)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 6
## term estimate std.error statistic p.value time
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 time_since_policy1 3.00 2.72e-14 1.10e14 0 1
## 2 time_since_policy2 4.00 2.72e-14 1.47e14 0 2
## 3 time_since_policy3 5.00 2.77e-14 1.80e14 0 3
## 4 time_since_policy4 6.00 2.77e-14 2.16e14 0 4
## 5 time_since_policy5 7.00 2.77e-14 2.52e14 0 5
## 6 time_since_policy6 8.00 2.77e-14 2.88e14 0 6
## 7 time_since_policy7 9.00 2.77e-14 3.24e14 0 7
## 8 time_since_policy8 10.0 2.87e-14 3.48e14 0 8
## 9 time_since_policy9 11.0 2.87e-14 3.83e14 0 9
## 10 time_since_policy10 12.0 4.00e-14 3.00e14 0 10
## 11 time_since_policy11 13.0 4.00e-14 3.25e14 0 11
## 12 time_since_policy12 14.0 4.00e-14 3.50e14 0 12
## 13 time_since_policy13 15.0 4.00e-14 3.75e14 0 13
## 14 time_since_policy14 16.0 4.00e-14 4.00e14 0 14
## 15 time_since_policy15 17.0 4.03e-14 4.22e14 0 15
## 16 time_since_policy16 18.0 4.03e-14 4.47e14 0 16
The regression model still works! The estimates from the policy
indicator variables equal those estimated by the Group-Time estimator
when specified using type = "dynamic" and the Target Trial
approach.
Takeway: When the treatment effect is staggered and
dynamic, you can still capture the effect estimate using a TWFE model so
long as the policy effect is modeled using the
time since treatment variable. The key question for the
researcher is identifying the parameter of interest – are you interested
in estimating a dynamic effect, or does a parameter that summarizes over
treatment time make sense?
I added the recommendation below We can consider moving it to the end
Recommendation: When there are multiple time periods, start by estimating a dynamic effect to see if the model supports its presence (i.e., does the effect change over time or is it constant?). If the effect appears dynamic, this is important because it is indicative of how the policy operates after being rolled out. If there is no strong support for a dynamic effect, then consider estimating the effects separately by timing of introduction if that is sensible for the policy change under study (i.e., if there are only a few separate time points), or estimating one overall summary parameter.
Scenario 7 is like Scenario 6 except now the treatment effect is dynamic and heterogeneous. In this scenario, the state that implements the policy change first has a stronger treatment effect (i.e., a steeper change in slope).
s7 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen7",
col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
"date", "date"))
Visualization of all comparisons made by TWFE regression
Contrast 1:
Contrast 2:
Contrast 3:
Contrast 4:
Again I think here it would be helpful to present a truth estimate, so the reader can see how contrast 4 biases the estimate
Similar to the previous scenario, parallel trends is satisfied for contrasts 1-3, so we can be okay with these contrasts contributing to the average effect estimate. Like last time, contrast 4 is the problem, and here the problem is intensified because the “control” state – which was treated in an earlier period – is undergoing a large treatment effect in the “pre” period (times 5-11). This makes the pre-post difference in the control state larger than the pre-post difference in the comparison state – because the control state is still experiencing a stronger (i.e., steeper slope) treatment effect than the state that is treated at time 12. This leads to a DID estimate for this group and time being relatively large, but negative, even though the true effect is to increase the outcome.
Let’s take a look at the regression output:
s7_mod <- lm(outcome ~ policy + state + factor(time),
data = s7)
tidy(s7_mod)
## # A tibble: 24 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 6.98 2.76 2.53 0.0143
## 2 policy1 6.84 2.18 3.13 0.00276
## 3 state2 1.00 1.58 0.634 0.529
## 4 state3 11.9 2.35 5.07 0.00000472
## 5 state4 8.17 1.86 4.40 0.0000496
## 6 factor(time)2 3.00 3.53 0.851 0.399
## 7 factor(time)3 6.00 3.53 1.70 0.0944
## 8 factor(time)4 9.00 3.53 2.55 0.0135
## 9 factor(time)5 11.0 3.57 3.09 0.00308
## 10 factor(time)6 14.5 3.57 4.07 0.000147
## # … with 14 more rows
The coefficient of the policy term equals 6.84.
Here we wrestle with the idea of the truth too. I think we should discuss this earlier and make clear how the investigator will need to define this for their question of interest
The question is, does this estimate come close to what it should be? Well, above we calculated effect estimates for four separate contrasts, where the fourth contrast did not satisfy the parallel trends assumption, but the other three contrasts did. So if I were summarizing across these clean contrasts I would want to make a summary estimate across the DID estimates of 18, 5, and 9. One question is, how much weight do we put on each of the estimates? Another question is, do we consider the DID estimate of 9 comparing earlier to later treated states? It isn’t straightforward what the estimate should be when summarizing across several contrasts. Let’s consider the Goodman Bacon decomposition to see how much each comparison contributed to the TWFE estimate:
s7 %<>% mutate(state_n = as.numeric(as.character(state)),
policy_n = as.numeric(as.character(policy)))
s7_bacon <- bacon(outcome ~ policy_n,
data = s7,
id_var = "state_n",
time_var = "time")
## type weight avg_est
## 1 Earlier vs Later Treated 0.06715 9.00000
## 2 Later vs Earlier Treated 0.15108 -11.00000
## 3 Treated vs Untreated 0.78177 10.10429
s7_bacon
## treated untreated estimate weight type
## 2 5 99999 18 0.30695444 Treated vs Untreated
## 3 12 99999 5 0.47482014 Treated vs Untreated
## 6 12 5 -11 0.15107914 Later vs Earlier Treated
## 8 5 12 9 0.06714628 Earlier vs Later Treated
Notice that our fourth contrast (here shown in row 3 of the second table) contributes 15% of the weight to the estimate. We want an estimator that does not use this contrast!
Below we confirm the weighted estimate across the rows equals the TWFE estimate from the regression:
#see the TWFE estimate
sum(s7_bacon$estimate*s7_bacon$weight)
## [1] 6.841727
The TWFE estimator is influenced by the forbidden fourth contrast that doesn’t meet the parallel trends assumption, so we’d like an estimator that doesn’t suffer from this concern.
Now that we know how to use the Callaway and Sant’Anna code, we can
apply it to this example as well. Let’s start by running the
att_gt() function. This time we won’t display the output
from summary(), but feel free to uncomment that line in the
R markdown document to take a peak:
s7_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state_n",
gname = "time_first_treat",
data = s7,
control_group = "notyettreated",
anticipation = 0)
## No pre-treatment periods to test
#summary(s7_cs)
Let’s aggregate the ATTs using type = simple. Callaway
and Sant’Anna’s estimate considers only the clean contrasts of 1 and 2
and combines them by weighting by the total time in the post-treatment
period. For contrast 1, there are 16 time periods post treatment and for
contrast 2, there are 9 periods post-treatment. Thus this estimate is
the weighted average: \(18\times(16/25) +
5\times(9/25)=\) 13.32
#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s7_cs_ag <- aggte(s7_cs, type="simple")
summary(s7_cs_ag)
##
## Call:
## aggte(MP = s7_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 13.32 8.8814 -4.0872 30.7272
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Alternatively, we can specify type = "group" to get
separate effect estimates according to time of implementation and see
that these match our by-hand calculations:
s7_cs_ag2 <- aggte(s7_cs, type = "group")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s7_cs_ag2)
##
## Call:
## aggte(MP = s7_cs, type = "group")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on group/cohort aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 11.5 NA NA NA
##
##
## Group Effects:
## Group Estimate Std. Error [95% Pointwise Conf. Band]
## 5 18 NA NA NA
## 12 5 NA NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
[Note here that the Overall ATT is a simple average of the two groups (e.g., (18 + 5)/2 = 11.5)]
Alternatively, you can estimate the dynamic effect separately for each time since the treatment has been introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be alarmed by all the rows in this table!
s7_cs_ag3 <- aggte(s7_cs, type="dynamic")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e
## = min_e, : Simultaneous conf. band is somehow smaller than pointwise one
## using normal approximation. Since this is unusual, we are reporting pointwise
## confidence intervals
summary(s7_cs_ag3)
##
## Call:
## aggte(MP = s7_cs, type = "dynamic")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on event-study/dynamic aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 16.3125 1.2509 13.8607 18.7643 *
##
##
## Dynamic Effects:
## Event time Estimate Std. Error [95% Pointwise Conf. Band]
## -10 0.0 NA NA NA
## -9 0.0 NA NA NA
## -8 0.0 NA NA NA
## -7 0.0 NA NA NA
## -6 0.0 NA NA NA
## -5 0.0 NA NA NA
## -4 0.0 NA NA NA
## -3 0.0 NA NA NA
## -2 0.0 NA NA NA
## -1 0.0 NA NA NA
## 0 2.0 0.7413 0.5471 3.4529 *
## 1 3.5 1.1120 1.3206 5.6794 *
## 2 5.0 1.4826 2.0942 7.9058 *
## 3 6.5 NA NA NA
## 4 8.0 2.2239 3.6412 12.3588 *
## 5 9.5 2.5946 4.4148 14.5852 *
## 6 11.0 NA NA NA
## 7 12.5 NA NA NA
## 8 14.0 3.7065 6.7354 21.2646 *
## 9 21.0 NA NA NA
## 10 23.0 NA NA NA
## 11 25.0 NA NA NA
## 12 27.0 NA NA NA
## 13 29.0 NA NA NA
## 14 31.0 NA NA NA
## 15 33.0 NA NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
(Note here that the Overall ATT is a simple average of all effects in the post-treatment time. This is equal to (2 + 3.5 + 5 + 6.5 + … + 33)/16=16.31)
Because the treatment effect is heterogeneous (the later treated state has a smaller treatment effect than the earlier treated state) and because there is less observed treated time for the later state, this means that the treatment effect estimates for event times 0 through 8 combine across the two treated states but for times 9 and over those effect estimates are based only on the first treated state. Without knowing this, it looks like the treatment effect “jumps” between event times 8 and 9 and that there is a slope increase, but this is an artifact based on combining across states with heterogeneous effects with different lengths of observed treatment times.
We can also plot the effect estimates over time and see this jump and slope increase between event times 8 and 9. Note also the confidence interval estimates that occurred for times 0 through 8 since there were two data points that could be used to estimate standard errors.
ggdid(s7_cs_ag3)
Here again we don’t display the results using
type = "calendar" but have included the code here in case
you want to run it.
s7_cs_ag4 <- aggte(s7_cs, type = "calendar")
summary(s7_cs_ag4)
ggdid(s7_cs_ag4)
CR write up this section
s7_bm <- fit_event_jack(outcome_var = "outcome",
date_var = "time_as_date",
unit_var = "state",
policy_var = "time_first_trt_date",
data = s7,
max_time_to = 10e7)
## Dropping 1
## Dropping 2
## Dropping 3
## Dropping 4
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining, by = c("cohort", "event_time")
s7_bm_ave <- s7_bm %>% filter(cohort == "average")
s7_bm_ave %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
ggplot(s7_bm_ave, aes(x = event_time, y = estimate)) +
geom_hline(yintercept = 0, lty = 2) +
geom_point(aes(col = event_time >= 0)) +
geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) +
labs(y = "Estimate", x = "Event time") +
theme_bw() +
scale_color_discrete(labels=c('pre', 'post')) +
theme(legend.title=element_blank(), legend.position = "bottom")
time_since_policy specificationBut what if we run the TWFE model with the
time_since_change indicators? Here, I run the regression
and filter() out the policy estimates for easy viewing
since the regression table is so large:
s7_mod2 <- lm(outcome ~ time_since_policy + state + factor(time),
data = s7)
tidy(s7_mod2) %>% filter(str_detect(term, "time_since")) %>%
mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>%
arrange(time)
## # A tibble: 16 × 6
## term estimate std.error statistic p.value time
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 time_since_policy1 0.621 1.13 0.547 5.87e- 1 1
## 2 time_since_policy2 2.06 1.13 1.81 7.73e- 2 2
## 3 time_since_policy3 4.11 1.16 3.56 9.55e- 4 3
## 4 time_since_policy4 5.64 1.16 4.88 1.63e- 5 4
## 5 time_since_policy5 7.17 1.16 6.21 2.19e- 7 5
## 6 time_since_policy6 8.70 1.16 7.53 2.96e- 9 6
## 7 time_since_policy7 10.2 1.16 8.85 4.59e-11 7
## 8 time_since_policy8 11.7 1.20 9.77 2.89e-12 8
## 9 time_since_policy9 13.3 1.20 11.1 6.07e-14 9
## 10 time_since_policy10 18.7 1.67 11.2 5.00e-14 10
## 11 time_since_policy11 20.8 1.67 12.5 1.48e-15 11
## 12 time_since_policy12 23.0 1.67 13.8 5.41e-17 12
## 13 time_since_policy13 25.2 1.67 15.1 2.41e-18 13
## 14 time_since_policy14 27.4 1.67 16.4 1.29e-19 14
## 15 time_since_policy15 29.5 1.68 17.6 1.04e-20 15
## 16 time_since_policy16 31.7 1.68 18.9 7.43e-22 16
The effect estimates on the policy indicators do not equal an average of the ATTs from the two states for each time period. For this example, this specification gives estimates that are systematically lower than those produced by the Group-Time estimator and the Target Trial estimator.
Takeway: When the treatment effect is staggered, dynamic, and heterogeneous, the TWFE regression specification will produced biased results, as will the specification using a categorical variable for time since treatment. In this case, it is best to use one of the newer estimators.
So far, we have considered toy examples where the outcomes didn’t incorporate noise. This was great for getting started, but doesn’t reflect the imprecision in estimates based on real-world data. Furthermore, we couldn’t estimate standard errors because there was not error in the estimation.
This scenario is more realistic. It increases the number of treated and control units, as is common in analyses across several geographic areas like states. We also added variability to the outcome, with different amounts of variability by state to reflect units containing varying sample sizes.
s8_start <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen8",
col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))
# we hid some code here from displaying in the html version of this file where
# we build up this scenario and add noise. Please view the Rmd document to see
# these lines of code!
Visualization of all comparisons made by TWFE regression
We start by running the linear model as we’ve done previously:
s8_mod <- lm(outcome ~ policy + factor(state) + factor(time),
data = all_states)
tidy(s8_mod)
## # A tibble: 54 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 7.70 1.55 4.98 8.06e- 7
## 2 policy1 10.1 0.806 12.5 2.59e-32
## 3 factor(state)11 0.126 1.46 0.0862 9.31e- 1
## 4 factor(state)12 -0.449 1.46 -0.307 7.59e- 1
## 5 factor(state)13 -1.04 1.46 -0.711 4.77e- 1
## 6 factor(state)14 -0.430 1.46 -0.294 7.69e- 1
## 7 factor(state)15 -0.0183 1.46 -0.0125 9.90e- 1
## 8 factor(state)20 0.713 1.46 0.488 6.26e- 1
## 9 factor(state)21 -1.75 1.46 -1.20 2.32e- 1
## 10 factor(state)22 0.668 1.46 0.457 6.48e- 1
## # … with 44 more rows
Up until now, we haven’t worried about the estimation of the 95%
confidence interval because our previous datasets didn’t contain any
noise. In practice, we want to estimate confidence intervals and to do
that correctly, we need to account for the clustering on data over time
within state. To do that we use the vcovHC function from
the sandwich package.
#install.packages("sandwich")
library(sandwich)
m2_var <- vcovHC(s8_mod)
We can then pull out the variance from the policy1 term
from the variance-covariance matrix and use it to compute Wald type
confidence intervals:
est = summary(s8_mod)$coefficients["policy1", "Estimate"]
lb = summary(s8_mod)$coefficients["policy1", "Estimate"] - 1.96*sqrt(m2_var["policy1","policy1"])
ub = summary(s8_mod)$coefficients["policy1", "Estimate"] + 1.96*sqrt(m2_var["policy1","policy1"])
The estimated effect is 10.1 with a 95% CI from 8.4 to 11.8.
We can use the Goodman Bacon decomposition to examine the contrasts contributing to the TWFE estimate.
The DID contrasts for the four comparisons are:
We can compare this to the output from the Goodman Bacon decomposition:
all_states %<>% mutate(policy_n = as.numeric(policy))
s8_bacon <- bacon(outcome ~ policy_n,
data = all_states,
id_var = "state",
time_var = "time")
## type weight avg_est
## 1 Earlier vs Later Treated 0.09929 9.48462
## 2 Later vs Earlier Treated 0.06383 -9.69867
## 3 Treated vs Untreated 0.83688 11.64355
s8_bacon
## treated untreated estimate weight type
## 2 15 99999 17.488998 0.45390071 Treated vs Untreated
## 3 22 99999 4.715617 0.38297872 Treated vs Untreated
## 6 22 15 -9.698670 0.06382979 Later vs Earlier Treated
## 8 15 22 9.484617 0.09929078 Earlier vs Later Treated
We can see that the estimates displayed in the second table are not too far off from the DID contrasts listed above. For this scenario, 84% of the weight is on the treated vs. untreated and about 10% of the weight is on the earlier vs. the later treated, with only 6% of the weight on the forbidden contrast. In the next section we will see how these weights change if pre-treatment time is shortened.
Compare to Callaway and Sant’Anna’s approach:
again I think the control group should be nevertreated. I don’t think it affects anything because the never treated are included as controls, but I’m not sure you necessary want to include the not yet treated as controls for this example.
s8_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state",
gname = "time_first_treat",
data = all_states,
control_group = "nevertreated",
anticipation = 0)
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state", gname
## = "time_first_treat", : Not returning pre-test Wald statistic due to singular
## covariance matrix
s8_cs_ag <- aggte(s8_cs, type="simple")
summary(s8_cs_ag)
##
## Call:
## aggte(MP = s8_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 13.0551 3.2585 6.6686 19.4417 *
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Never Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
lb = s8_cs_ag$overall.att - 1.96*s8_cs_ag$overall.se
ub = s8_cs_ag$overall.att + 1.96*s8_cs_ag$overall.se
round(c(s8_cs_ag$overall.att, lb, ub), 2)
## [1] 13.06 6.67 19.44
The CS estimate is 13.06 with 95% CI 6.67 to 19.44.
The simple ATT is a weighted average (based on the amount of time in the post-treatment period) of the estimated effects for the treated groups. To figure out what the group specific estimate are we can ask for the estimates by group:
s8_cs_ag2 <- aggte(s8_cs, type = "group")
summary(s8_cs_ag2)
##
## Call:
## aggte(MP = s8_cs, type = "group")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on group/cohort aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 10.7079 2.0139 6.7607 14.6552 *
##
##
## Group Effects:
## Group Estimate Std. Error [95% Simult. Conf. Band]
## 15 19.0909 3.0665 13.0117 25.1700 *
## 22 2.3249 2.3235 -2.2812 6.9311
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Never Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
And we can confirm how the simple weighted estimate was calculated:
s8_cs_ag2$att.egt[1]*(16/25) + s8_cs_ag2$att.egt[2]*(9/25)
## [1] 13.05514
NOTE: (Was expecting 18 and 5 as the group specific estimates – possible we are off just due to random error - need to rerun with a different seed or with less variability and see if that is true)
Yeah, I think it might be a random error issue. The confidence intervals cover 5 so I am not too worried.
Finally, we can look at these effect estimated separately for each time since the intervention began and graph them:
s8_cs_ag3 <- aggte(s8_cs, type="dynamic")
summary(s8_cs_ag3)
##
## Call:
## aggte(MP = s8_cs, type = "dynamic")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on event-study/dynamic aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 16.0589 2.7483 10.6724 21.4454 *
##
##
## Dynamic Effects:
## Event time Estimate Std. Error [95% Simult. Conf. Band]
## -20 -0.6142 2.3550 -6.7217 5.4933
## -19 -2.0235 2.3039 -7.9986 3.9516
## -18 2.5619 2.3552 -3.5461 8.6699
## -17 -1.1189 2.2475 -6.9476 4.7098
## -16 -2.7910 2.2857 -8.7189 3.1368
## -15 0.6003 1.3782 -2.9740 4.1745
## -14 2.6788 2.9659 -5.0132 10.3708
## -13 -0.5176 1.9962 -5.6946 4.6593
## -12 0.5020 2.7049 -6.5131 7.5172
## -11 -1.0797 2.3852 -7.2656 5.1061
## -10 1.2770 3.0526 -6.6398 9.1937
## -9 -1.1832 2.2441 -7.0031 4.6367
## -8 1.9790 2.2229 -3.7859 7.7438
## -7 1.4156 1.9984 -3.7672 6.5984
## -6 -2.3957 3.1920 -10.6741 5.8827
## -5 0.6336 1.9531 -4.4318 5.6990
## -4 -1.6087 2.9062 -9.1459 5.9285
## -3 -1.8200 1.9195 -6.7981 3.1580
## -2 1.4925 2.3177 -4.5183 7.5032
## -1 1.3875 2.0235 -3.8603 6.6352
## 0 2.4347 3.6385 -7.0017 11.8711
## 1 4.4729 3.0440 -3.4216 12.3674
## 2 5.2589 2.2307 -0.5264 11.0442
## 3 5.8989 2.7899 -1.3366 13.1344
## 4 6.2301 2.1866 0.5594 11.9008 *
## 5 6.3651 2.7060 -0.6527 13.3829
## 6 11.8416 3.0154 4.0214 19.6619 *
## 7 11.9494 3.2717 3.4645 20.4343 *
## 8 14.9849 2.7132 7.9483 22.0214 *
## 9 20.6918 6.6501 3.4451 37.9384 *
## 10 20.7157 2.0250 15.4640 25.9675 *
## 11 25.9967 3.4110 17.1504 34.8430 *
## 12 27.7174 4.1050 17.0714 38.3635 *
## 13 30.3829 3.5458 21.1872 39.5787 *
## 14 31.0607 2.6938 24.0745 38.0469 *
## 15 30.9403 4.9003 18.2316 43.6491 *
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Never Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
ggdid(s8_cs_ag3)
Now that we have confidence intervals, we can assess the parallel trends assumption by investigating if the pre-treatment confidence intervals overlap the null and are randomly distributed around the zero line. This is true in our example, so we wouldn’t be concerned about violating the parallel trends assumption with these data.
Now we can also calculate effects using the Cohort ATT estimator. We
couldn’t run this estimator for the other scenarios because the function
throws an error if it’s unable to estimate standard errors. To use the
staggered_sa function, we need to add a new variable where
the time_first_treat variable is set to Inf
for all states that were never-treated:
table(all_states$time_first_treat, useNA = "always")
##
## 0 15 22 <NA>
## 360 180 180 0
all_states$time_first_treat2 <- all_states$time_first_treat
all_states$time_first_treat2[all_states$time_first_treat == 0] <- Inf
#check that the recoding is as planned:
table(all_states$time_first_treat, all_states$time_first_treat2, useNA = "always")
##
## 15 22 Inf <NA>
## 0 0 0 360 0
## 15 180 0 0 0
## 22 0 180 0 0
## <NA> 0 0 0 0
For this function, we need to specify the data frame in the
df argument, the clustering unit in the i
argument, the time unit in the t argument. Here we use the
newly-defined time_first_treat2 as the grouping variable
g.
s8_sa <- staggered_sa(df = all_states,
i = "state",
t = "time",
g = "time_first_treat2",
y = "outcome",
estimand = "simple")
s8_sa$estimate
## [1] 13.05514
The estimate of the treatment effect is 13.0551383 and we can calculate its confidence interval using either of these:
#95% CI
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se, s8_sa$estimate + 1.96*s8_sa$se), 2)
## [1] 13.06 8.90 17.21
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se_neyman, s8_sa$estimate + 1.96*s8_sa$se_neyman), 2)
## [1] 13.06 8.71 17.40
For this example, the Group-Time estimates and Cohort ATT estimates are quite close and have overlapping confidence intervals. They’re both higher than the TWFE estimate because they do not use information based on the forbidden contrast.
We can also consider the Target trial estimator.
Their function relies on the date variable being a date object rather than a numeric index. Without loss of generality, we add dates to this dataset picking Jan 1, 2015 as the start date by way of exposition and incrementing time by one-month intervals:
all_states$time2 <- as.Date("2015-01-01") #date 0
all_states$period1 <- paste(all_states$time, " month")
all_states$time3 <- all_states$time2 %m+% period(all_states$period1) #new time variable
all_states %<>% select(-period1, -time2)
To use this estimator, the time_first_treat variable
needs to be set to missing for the untreated states for the function to
run:
all_states %<>% mutate(time_first_treat3 = case_when(time_first_treat2 == 15 ~ as.Date("2016-04-01"),
time_first_treat2 == 22 ~ as.Date("2016-11-01"),
is.infinite(time_first_treat2) ~ NA_Date_))
#check the coding
#table(all_states2$time_first_treat2, all_states2$time_first_treat3, useNA = "ifany")
s8_bm <- fit_event_jack(outcome_var = "outcome",
date_var = "time3",
unit_var = "state",
policy_var = "time_first_treat3",
data = all_states,
max_time_to = 10e7)
## Dropping 10
## Dropping 11
## Dropping 12
## Dropping 13
## Dropping 14
## Dropping 15
## Dropping 20
## Dropping 21
## Dropping 22
## Dropping 23
## Dropping 24
## Dropping 25
## Dropping 30
## Dropping 31
## Dropping 32
## Dropping 33
## Dropping 34
## Dropping 35
## Dropping 40
## Dropping 41
## Dropping 42
## Dropping 43
## Dropping 44
## Dropping 45
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining, by = c("cohort", "event_time")
s8_bm_ave <- s8_bm %>% filter(cohort == "average")
s8_bm_ave %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
ggplot(s8_bm_ave, aes(x = event_time, y = estimate)) +
geom_hline(yintercept = 0, lty = 2) +
geom_point(aes(col = event_time >= 0)) +
geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) +
labs(y = "Estimate", x = "Event time") +
theme_bw() +
scale_color_discrete(labels=c('pre', 'post')) +
theme(legend.title=element_blank(), legend.position = "bottom")
In this scenario, we use the same data from Scenario 8, except we limit the pre-intervention point to 5 time units, rather than the 15 in Scenario 8. The amount of pre-intervention time should not affect the size of the effect estimate because the size of the change from the previous scenario is the same. The goal of this scenario is to see how the different estimators are affected by this change in the amount of pre-intervention time.
We start by limiting the time frame to begin at time t=10 or larger:
all_states2 <- all_states %>% filter(time >= 10)
And then we re-run the model:
s9_mod <- lm(outcome ~ policy + factor(state) + factor(time),
data = all_states2)
tidy(s9_mod)
## # A tibble: 45 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 33.9 1.82 18.7 3.35e-58
## 2 policy1 7.62 1.04 7.32 1.11e-12
## 3 factor(state)11 1.04 1.87 0.556 5.79e- 1
## 4 factor(state)12 -0.669 1.87 -0.358 7.21e- 1
## 5 factor(state)13 -1.18 1.87 -0.629 5.29e- 1
## 6 factor(state)14 -0.475 1.87 -0.254 8.00e- 1
## 7 factor(state)15 0.451 1.87 0.241 8.09e- 1
## 8 factor(state)20 0.261 1.87 0.140 8.89e- 1
## 9 factor(state)21 -1.88 1.87 -1.00 3.16e- 1
## 10 factor(state)22 0.397 1.87 0.212 8.32e- 1
## # … with 35 more rows
Again, we use the vcovHC function to estimate the SE and
obtain the 95% confidence interval for the policy coefficient:
m2_var <- vcovHC(s9_mod)
est2 = summary(s9_mod)$coefficients["policy1", "Estimate"]
lb2 = summary(s9_mod)$coefficients["policy1", "Estimate"] - 1.96*sqrt(m2_var["policy1","policy1"])
ub2 = summary(s9_mod)$coefficients["policy1", "Estimate"] + 1.96*sqrt(m2_var["policy1","policy1"])
The estimated effect is 7.6 with a 95% CI from 5.4 to 9.8. Let’s compare this with the estimate from Scenario 8:
| Estimate (95% CI) | |
|---|---|
| Scenario 8 | 10.1 (6.7,19.4) |
| Scenario 9 | 7.6 (5.4,9.8) |
The estimate for Scenario 9 is much lower than that for Scenario 8.
Let’s see how the weights from the Bacon decomposition for Scenario 9 compare to Scenario 8.
Scenario 9 weights:
all_states2 %<>% mutate(policy_n = as.numeric(policy))
s9_bacon <- bacon(outcome ~ policy_n,
data = all_states2,
id_var = "state",
time_var = "time")
## type weight avg_est
## 1 Earlier vs Later Treated 0.07384 9.78211
## 2 Later vs Earlier Treated 0.13291 -9.69867
## 3 Treated vs Untreated 0.79325 10.32571
s9_bacon
## treated untreated estimate weight type
## 2 15 99999 17.822890 0.33755274 Treated vs Untreated
## 3 22 99999 4.772245 0.45569620 Treated vs Untreated
## 6 22 15 -9.698670 0.13291139 Later vs Earlier Treated
## 8 15 22 9.782109 0.07383966 Earlier vs Later Treated
Scenario 8 weights:
s8_bacon
## treated untreated estimate weight type
## 2 15 99999 17.488998 0.45390071 Treated vs Untreated
## 3 22 99999 4.715617 0.38297872 Treated vs Untreated
## 6 22 15 -9.698670 0.06382979 Later vs Earlier Treated
## 8 15 22 9.484617 0.09929078 Earlier vs Later Treated
Comparing the weights:
## treated untreated estimate Scenario 8 weight Scenario 9 weight
## 1 15 99999 17.82 0.45 0.34
## 2 22 99999 4.77 0.38 0.46
## 3 22 15 -9.70 0.06 0.13
## 4 15 22 9.78 0.10 0.07
## Weight difference type
## 1 -0.12 Treated vs Untreated
## 2 0.07 Treated vs Untreated
## 3 0.07 Later vs Earlier Treated
## 4 -0.03 Earlier vs Later Treated
More weight is put on the states treated at time=22 in Scenario 9 vs. 8 and less weight on the states treated at time=15. This is because the TWFE estimator is a weighted combination of contrasts, where the weights are partially determined by how close the contrast is to the middle of the panel. More weight is put on the estimates for groups with changes closer to the middle of the panel. For Scenario 8, the states with treatment changes at time 15 are exactly in the middle of the panel (which had 30 time units), but for Scenario 9, with a shortened pre-treatment period, less weight is put on this group as they are further away from the middle of the panel and more weight is put on the group that introduced treatment at time point 22 because this becomes closer to the middle.
This weighting does not likely correspond to any weighting that the researcher would choose. Concerning!
Let’s see how the other estimators are affected by this change:
s9_cs <- att_gt(yname = "outcome",
tname = "time",
idname = "state",
gname = "time_first_treat",
data = all_states2,
control_group = "notyettreated",
anticipation = 0)
s9_cs_ag <- aggte(s9_cs, type="simple")
summary(s9_cs_ag)
##
## Call:
## aggte(MP = s9_cs, type = "simple")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## ATT Std. Error [ 95% Conf. Int.]
## 12.9667 2.9875 7.1112 18.8222 *
##
##
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
The Callaway Sant’Anna estimate of the ATT is the same as for Scenario 9 (with a slightly lower SE).
What about the Cohort ATT estimate?
s9_sa <- staggered_sa(df = all_states2,
i = "state",
t = "time",
g = "time_first_treat2",
y = "outcome",
estimand = "simple")
s9_sa$estimate
## [1] 13.05514
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se, s8_sa$estimate + 1.96*s8_sa$se), 2)
## [1] 13.06 8.90 17.21
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se_neyman, s8_sa$estimate + 1.96*s8_sa$se_neyman), 2)
## [1] 13.06 8.71 17.40
Like the Group-Time ATT estimate, the Cohort ATT estimate stays exactly the same.
Lastly, what happens to the Target Trail estimates? We compare them below:
s8_bm_hte <- fit_event_jack_sum_hte(outcome_var = "outcome",
date_var = "time3",
unit_var = "state",
policy_var = "time_first_treat3",
data = all_states,
max_time_to = 10e7)
## Dropping 10
## Dropping 11
## Dropping 12
## Dropping 13
## Dropping 14
## Dropping 15
## Dropping 20
## Dropping 21
## Dropping 22
## Dropping 23
## Dropping 24
## Dropping 25
## Dropping 30
## Dropping 31
## Dropping 32
## Dropping 33
## Dropping 34
## Dropping 35
## Dropping 40
## Dropping 41
## Dropping 42
## Dropping 43
## Dropping 44
## Dropping 45
s8_bm_hte %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
s8_bm_hte
## se estimate lb ub
## 1 3.243503 10.70791 4.35064 17.06517
s9_bm_hte <- fit_event_jack_sum_hte(outcome_var = "outcome",
date_var = "time3",
unit_var = "state",
policy_var = "time_first_treat3",
data = all_states2,
max_time_to = 10e7)
## Dropping 10
## Dropping 11
## Dropping 12
## Dropping 13
## Dropping 14
## Dropping 15
## Dropping 20
## Dropping 21
## Dropping 22
## Dropping 23
## Dropping 24
## Dropping 25
## Dropping 30
## Dropping 31
## Dropping 32
## Dropping 33
## Dropping 34
## Dropping 35
## Dropping 40
## Dropping 41
## Dropping 42
## Dropping 43
## Dropping 44
## Dropping 45
s9_bm_hte %<>% mutate(lb = estimate - 1.96*se,
ub = estimate + 1.96*se)
s9_bm_hte
## se estimate lb ub
## 1 3.243503 10.70791 4.35064 17.06517
The Target Trial estimates stay exactly the same as well.
Takeway: The TWFE estimate is affected by the length of the panel. It places more weight on contrasts that are closer to the middle of the panel. The newer estimators do not have this issue.